A Theoretical Guarantees for FINE Algorithm

Neural Information Processing Systems

This section provides detailed proofs of Theorem 1 and of the lower bounds on precision and recall. We derive these results using concentration inequalities from probability theory. Throughout this section, we frequently use the spectral norm $\|\cdot\|_2$ (for example, an orthogonal matrix $U$ satisfies $U^\top U = I$, hence $\|U\|_2 = 1$).

A.2 Proof of Theorem 1

We first collect the lemmas required for the proof of Theorem 1. Lemma 1. Lemma 3 (Davis-Kahan sin θ theorem). For given real symmetric matrices $A$ and $B$, assume that $A$ and $A + B$ have non-negative eigenvalues.
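The excerpt cuts off the statement of Lemma 3, so the following is only a numerical sanity check of a widely used form of the Davis-Kahan sin θ theorem (the variant due to Yu, Wang and Samworth), which may differ from the paper's exact lemma: if $v_1$ and $\hat v_1$ are the top unit eigenvectors of $A$ and $A + B$, then $\sin\theta(v_1, \hat v_1) \le 2\|B\|_2 / (\lambda_1(A) - \lambda_2(A))$. To stay dependency-free, the sketch uses a hand-written cyclic Jacobi eigensolver; `jacobi_eigh` and `davis_kahan_check` are hypothetical helper names, and the example matrices are our own choice (both $A$ and $A + B$ are positive semidefinite, matching the lemma's assumption).

```python
import math


def jacobi_eigh(M, tol=1e-22, max_sweeps=100):
    """Eigendecomposition of a real symmetric matrix via cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) sorted by decreasing eigenvalue;
    eigenvectors[k] is the unit eigenvector belonging to eigenvalues[k].
    """
    n = len(M)
    A = [[float(v) for v in row] for row in M]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] * A[i][j] for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q: A <- A J
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q: A <- J^T A
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda k: -A[k][k])
    return [A[k][k] for k in order], [[V[i][k] for i in range(n)] for k in order]


def davis_kahan_check(A, B):
    """Return (sin_theta, bound) for the top eigenvectors of A and A + B."""
    n = len(A)
    A_hat = [[A[i][j] + B[i][j] for j in range(n)] for i in range(n)]
    lam, vecs = jacobi_eigh(A)
    _, vecs_hat = jacobi_eigh(A_hat)
    cos = abs(sum(u * v for u, v in zip(vecs[0], vecs_hat[0])))
    sin_theta = math.sqrt(max(0.0, 1.0 - cos * cos))
    norm_B = max(abs(v) for v in jacobi_eigh(B)[0])  # spectral norm of symmetric B
    return sin_theta, 2.0 * norm_B / (lam[0] - lam[1])


# Usage: A = diag(5, 2, 1), perturbation of strength 0.3 coupling coordinates 1 and 2.
A = [[5.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
B = [[0.0, 0.3, 0.0], [0.3, 0.0, 0.0], [0.0, 0.0, 0.0]]
sin_theta, bound = davis_kahan_check(A, B)
```

In this example the angle is also available in closed form, $\tan 2\theta = 2 \cdot 0.3 / (5 - 2) = 0.2$, so $\sin\theta \approx 0.0985$, comfortably below the bound $2 \cdot 0.3 / 3 = 0.2$.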


Stochastic Smoothed Primal-Dual Algorithms for Nonconvex Optimization with Linear Inequality Constraints

arXiv.org Artificial Intelligence

We propose smoothed primal-dual algorithms for solving stochastic and smooth nonconvex optimization problems with linear inequality constraints. Our algorithms are single-loop and only require a single stochastic gradient based on one sample at each iteration. A distinguishing feature of our algorithms is that they are based on an inexact gradient descent framework for the Moreau envelope, where the gradient of the Moreau envelope is estimated using one step of a stochastic primal-dual augmented Lagrangian method. To handle inequality constraints and stochasticity, we combine recently established global error bounds in constrained optimization with a Moreau envelope-based analysis of stochastic proximal algorithms. For obtaining $\varepsilon$-stationary points, we establish the optimal $O(\varepsilon^{-4})$ sample complexity guarantee for our algorithms and provide extensions to stochastic linear constraints. We also show how to improve this complexity to $O(\varepsilon^{-3})$ by using variance reduction and the expected smoothness assumption. Unlike existing methods, the iterations of our algorithms are free of subproblems, large batch sizes, and increasing penalty parameters, and they use dual variable updates to ensure feasibility.
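The abstract describes each iteration only at a high level. The following is a minimal single-loop sketch under our own assumptions, not the authors' algorithm, step sizes, or constants: a toy problem $\min_x \mathbb{E}[\tfrac12\|x - \xi\|^2]$ subject to $x_1 + x_2 \le 1$ with $\xi \sim N((2, 2), 0.25 I)$, one sample per iteration, a proximal (Moreau-envelope) term $\tfrac{\mu}{2}\|x - z\|^2$ with a slowly moving anchor $z$, one primal step on the augmented Lagrangian, and one dual step. The function name `smoothed_primal_dual` and all parameters (`rho`, `mu`, `eta`, `beta`, `theta`) are hypothetical. The KKT point of this toy problem is $x^\star = (0.5, 0.5)$ with multiplier $y^\star = 1.5$.

```python
import random


def smoothed_primal_dual(c, sigma=0.5, rho=1.0, mu=1.0, eta=0.05,
                         beta=0.5, theta=0.1, iters=20_000, seed=0):
    """Hypothetical sketch for min_x E[0.5*||x - xi||^2] s.t. x1 + x2 <= 1,
    with xi ~ N(c, sigma^2 I). Returns averages of x and y over the second half."""
    rng = random.Random(seed)
    a, b = (1.0, 1.0), 1.0                  # constraint a^T x <= b
    x, z, y = [0.0, 0.0], [0.0, 0.0], 0.0
    x_sum, y_sum, count = [0.0, 0.0], 0.0, 0
    for k in range(iters):
        # One-sample stochastic gradient of f(x) = E[0.5 * ||x - xi||^2].
        g = [x[i] - rng.gauss(c[i], sigma) for i in range(2)]
        # Augmented-Lagrangian multiplier estimate for a^T x <= b.
        s = a[0] * x[0] + a[1] * x[1]
        lam = max(0.0, y + rho * (s - b))
        # One primal step on the augmented Lagrangian plus (mu/2)||x - z||^2.
        x = [x[i] - eta * (g[i] + lam * a[i] + mu * (x[i] - z[i])) for i in range(2)]
        # One dual step toward the augmented-Lagrangian multiplier at the new x.
        s = a[0] * x[0] + a[1] * x[1]
        y += beta * (max(0.0, y + rho * (s - b)) - y)
        # Move the proximal anchor z toward x (smoothing step).
        z = [z[i] + theta * (x[i] - z[i]) for i in range(2)]
        if k >= iters // 2:                  # average the second half of the run
            x_sum = [x_sum[i] + x[i] for i in range(2)]
            y_sum += y
            count += 1
    return [v / count for v in x_sum], y_sum / count


x_bar, y_bar = smoothed_primal_dual((2.0, 2.0))
```

The iteration has no inner loop, no mini-batches, and a fixed penalty `rho`; averaging the second half of the iterates only damps the sampling noise so that the result can be compared with $x^\star$ and $y^\star$.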


Error Analysis of Three-Layer Neural Network Trained with PGD for Deep Ritz Method

arXiv.org Machine Learning

Machine learning is a rapidly advancing field with diverse applications across many domains. One prominent area of research is the use of deep learning techniques for solving partial differential equations (PDEs). In this work, we focus on employing a three-layer tanh neural network within the framework of the deep Ritz method (DRM) to solve second-order elliptic equations with three different types of boundary conditions. We use projected gradient descent (PGD) to train the three-layer network and establish its global convergence. To the best of our knowledge, we are the first to provide a comprehensive error analysis of overparameterized networks for solving PDE problems, as our analysis simultaneously includes estimates of the approximation error, generalization error, and optimization error. We present error bounds in terms of the sample size $n$, and our results provide guidance on how to set the network depth, width, step size, and number of iterations for the projected gradient descent algorithm. Importantly, our assumptions are classical, and we do not require any additional assumptions on the solution of the equation. This ensures the broad applicability and generality of our results.
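As a dependency-free illustration of two ingredients named here, a Monte Carlo estimate of the deep Ritz energy and projected gradient descent, the sketch below treats the 1D model problem $-u'' = f$ on $(0, 1)$ with $u(0) = u(1) = 0$ and $f(x) = \pi^2 \sin(\pi x)$, whose exact solution is $\sin(\pi x)$. It minimizes $\frac{1}{n}\sum_i \big(\tfrac12 u'(x_i)^2 - f(x_i)\,u(x_i)\big) + \lambda\,(u(0)^2 + u(1)^2)$, with a penalty enforcing the boundary condition. To avoid hand-deriving second-order backpropagation, it freezes random inner tanh weights and trains only the output weights, so it is a random-feature simplification, not the three-layer network analyzed in the paper; the function name, the projection radius, and all hyperparameters are hypothetical.

```python
import math
import random


def deep_ritz_pgd(m=10, n=100, lam=10.0, radius=20.0, iters=500, seed=0):
    """Hypothetical sketch: PGD on a Monte Carlo deep Ritz loss with frozen tanh features.
    Returns the output weights and the loss history."""
    rng = random.Random(seed)
    w = [rng.uniform(-5.0, 5.0) for _ in range(m)]   # frozen inner weights
    b = [rng.uniform(-5.0, 5.0) for _ in range(m)]   # frozen inner biases

    def features(x):
        """tanh features phi_r(x) and their x-derivatives phi_r'(x)."""
        t = [math.tanh(w[r] * x + b[r]) for r in range(m)]
        return t, [w[r] * (1.0 - t[r] * t[r]) for r in range(m)]

    xs = [rng.random() for _ in range(n)]            # Monte Carlo samples in (0, 1)
    P, D = zip(*(features(x) for x in xs))
    F = [math.pi ** 2 * math.sin(math.pi * x) for x in xs]
    boundary = [features(0.0)[0], features(1.0)[0]]  # phi at x = 0 and x = 1

    def loss_and_grad(a):
        loss, grad = 0.0, [0.0] * m
        for p, d, fx in zip(P, D, F):                # (1/n) sum [0.5 u'^2 - f u]
            u = sum(ar * pr for ar, pr in zip(a, p))
            du = sum(ar * dr for ar, dr in zip(a, d))
            loss += (0.5 * du * du - fx * u) / n
            for r in range(m):
                grad[r] += (du * d[r] - fx * p[r]) / n
        for q in boundary:                           # penalty lam * u(x_b)^2
            ub = sum(ar * qr for ar, qr in zip(a, q))
            loss += lam * ub * ub
            for r in range(m):
                grad[r] += 2.0 * lam * ub * q[r]
        return loss, grad

    # The loss is quadratic in a; trace(H) upper-bounds lambda_max(H).
    trace_h = (sum(dr * dr for d in D for dr in d) / n
               + 2.0 * lam * sum(qr * qr for q in boundary for qr in q))
    eta = 1.0 / trace_h
    a, history = [0.0] * m, []
    for _ in range(iters):
        loss, grad = loss_and_grad(a)
        history.append(loss)
        a = [ar - eta * gr for ar, gr in zip(a, grad)]   # gradient step
        norm = math.sqrt(sum(ar * ar for ar in a))
        if norm > radius:                                # project onto the l2 ball
            a = [ar * radius / norm for ar in a]
    history.append(loss_and_grad(a)[0])
    return a, history


a, history = deep_ritz_pgd()
```

Because the loss is quadratic in `a` and the step size $1/\operatorname{tr}(H)$ never exceeds $1/\lambda_{\max}(H)$, each projected step cannot increase the loss, which `history` lets you confirm.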


Constrained Langevin Algorithms with L-mixing External Random Variables

arXiv.org Artificial Intelligence

Langevin algorithms are gradient descent methods augmented with additive noise and are widely used in Markov chain Monte Carlo (MCMC) sampling, optimization, and machine learning. In recent years, the non-asymptotic analysis of Langevin algorithms for non-convex learning has been extensively explored. For constrained problems with non-convex losses over a compact convex domain with IID data variables, the projected Langevin algorithm achieves a deviation of $O(T^{-1/4} (\log T)^{1/2})$ from its target distribution [27] in $1$-Wasserstein distance. In this paper, we obtain a deviation of $O(T^{-1/2} \log T)$ in $1$-Wasserstein distance for non-convex losses with $L$-mixing data variables and polyhedral constraints (which are not necessarily bounded). This improves on the previous bound for constrained problems and matches the best-known bound for unconstrained problems.
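A minimal sketch of the projected Langevin iteration $x_{k+1} = \Pi_K\big(x_k - \eta\,\nabla_x U(x_k, \xi_k) + \sqrt{2\eta/\beta}\,\zeta_k\big)$ driven by dependent data, with $\zeta_k$ standard Gaussian noise. The assumptions are all ours: a one-dimensional loss $U(x, \xi) = \tfrac12 (x - \xi)^2$, the box $K = [0, 2]$ (a simple polyhedron whose Euclidean projection is clipping), and a stable AR(1) sequence $\xi_k$ with stationary mean 1 as a stand-in for L-mixing data; the function name and parameters are hypothetical. The target distribution is then proportional to $\exp(-\beta (x - 1)^2 / 2)$ restricted to $K$, approximately $N(1, 1/\beta)$.

```python
import math
import random


def projected_langevin(iters=50_000, eta=0.01, beta=10.0, lo=0.0, hi=2.0,
                       rho=0.5, noise=0.3, seed=0):
    """Hypothetical sketch: projected Langevin on [lo, hi] with AR(1) data xi_k."""
    rng = random.Random(seed)
    x, xi = lo, 1.0
    samples = []
    for _ in range(iters):
        # Dependent data: stable AR(1) process with stationary mean 1.
        xi = 1.0 + rho * (xi - 1.0) + noise * rng.gauss(0.0, 1.0)
        grad = x - xi                       # gradient of U(x, xi) = 0.5 * (x - xi)^2
        x = x - eta * grad + math.sqrt(2.0 * eta / beta) * rng.gauss(0.0, 1.0)
        x = min(max(x, lo), hi)             # Euclidean projection onto [lo, hi]
        samples.append(x)
    return samples


samples = projected_langevin()
tail = samples[10_000:]                     # discard burn-in
mean = sum(tail) / len(tail)
var = sum((s - mean) ** 2 for s in tail) / len(tail)
```

With $\eta = 0.01$ and $\beta = 10$, the post-burn-in sample mean and variance should be close to 1 and $1/\beta = 0.1$. For a general polyhedron, the clipping line would be replaced by a Euclidean projection, which in general requires solving a small quadratic program.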